AITopics | learning discrete distribution

Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$. Moreover, we show that any mechanism that only operates on the final aggregate should require a user complexity of the same order. We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(m\alpha^2) + k/\sqrt{m}\epsilon\alpha)$ and further show that it is nearly-optimal under certain regimes. Thus the privacy penalty is $\tilde{\Theta}(\sqrt{m})$ times smaller compared to the standard mechanisms. We also propose general techniques for obtaining lower bounds on restricted differentially private estimators and a lower bound on the total variation between binomial distributions, both of which might be of independent interest.

item-level privacy, learning discrete distribution, privacy, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.59)

Add feedback

Learning discrete distributions with infinite support

Neural Information Processing SystemsOct-2-2025, 12:56:52 GMT

In such a setting, distribution-free risk bounds are impossible, and the best one could hope for is a fully empirical data-dependent bound.

artificial intelligence, machine learning, theorem 2, (19 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States > Nebraska (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning discrete distributions with infinite support

Neural Information Processing SystemsOct-2-2025, 12:56:45 GMT

In such a setting, distribution-free risk bounds are impossible, and the best one could hope for is a fully empirical data-dependent bound.

artificial intelligence, machine learning, theorem 2, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States > Nebraska (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Learning discrete distributions: user vs item-level privacy

Neural Information Processing SystemsMay-27-2025, 16:13:36 GMT

Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over k symbols with user-level differential privacy. If each user has m samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be \mathcal{O}(k/(m\alpha 2) k/\epsilon\alpha) to achieve an \ell_1 distance of \alpha between the true and estimated distributions, with the privacy-induced penalty k/\epsilon\alpha independent of the number of samples per user m .

artificial intelligence, inductive learning, machine learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)

Add feedback

Review for NeurIPS paper: Learning discrete distributions: user vs item-level privacy

Neural Information Processing SystemsFeb-8-2025, 00:57:05 GMT

This paper examines user-level privacy in the context of learning discrete distributions. Near matching upper and lower bounds (with a corresponding mechanism) on the number of users m required for a desired level of total variation distance are established, while it is shown that natural baselines the Laplace and Gaussian mechanism achieve inferior performance by a factor of sqrt(m). User-level privacy is a variation on pure/approximate differential privacy in which a mechanism's response distribution must be indistinguishable not only to change of an individual item (record) but those items belonging to a user. The paper considers the cases of users contributing equal and unequal numbers of items. R3 highlights an important concern with the user-privacy definition required for the proposed mechanism: that the users' items are drawn i.i.d.

discrete distribution, learning discrete distribution, privacy, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.37)

Add feedback

Learning discrete distributions: user vs item-level privacy

Neural Information Processing SystemsJan-14-2025, 22:25:29 GMT

Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over k symbols with user-level differential privacy. If each user has m samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be \mathcal{O}(k/(m\alpha 2) k/\epsilon\alpha) to achieve an \ell_1 distance of \alpha between the true and estimated distributions, with the privacy-induced penalty k/\epsilon\alpha independent of the number of samples per user m .

discrete distribution, item-level privacy, privacy, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)

Add feedback

Learning Discrete Distributions by Dequantization

Hoogeboom, Emiel, Cohen, Taco S., Tomczak, Jakub M.

arXiv.org Machine LearningJan-30-2020

Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Na\"ive optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In this paper, we present a general framework for dequantization that captures existing methods as a special case. We derive two new dequantization objectives: importance-weighted (iw) dequantization and R\'enyi dequantization. In addition, we introduce autoregressive dequantization (ARD) for more flexible dequantization distributions. Empirically we find that iw and R\'enyi dequantization considerably improve performance for uniform dequantization distributions. ARD achieves a negative log-likelihood of 3.06 bits per dimension on CIFAR10, which to the best of our knowledge is state-of-the-art among distribution models that do not require autoregressive inverses for sampling.

dequantization, dequantization distribution, learning discrete distribution, (14 more...)

arXiv.org Machine Learning

2001.11235

Country: